Nature Genetics
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Polygenic scores (PGS) predict complex traits and stratify disease risk but often fail to fully capture individual-level variation. "Misaligned" individuals, whose observed phenotypes deviate from the values expected from their PGS, provide a powerful model for identifying factors beyond common-variant effects, including additional genetic factors. Here, we apply misalignment classification and enrichment testing frameworks to seven continuous and three dichotomous...
Mapping the pleiotropic effect of genetic variation on biological processes and complex phenotypes is fundamental to extracting translational insight from genome-wide association studies (GWAS). Here we present The Human Genotype-Phenotype Map (GPMap), a repository of colocalizing genetic associations across 15,997 complex traits and 2.7 million molecular measurements, leveraging common and rare variants and cis- and trans-acting effects across disaggregated tissue types and single cell datasets ...
Genome-wide association studies (GWAS) have implicated tens of thousands of genetic variants associated with complex traits and polygenic diseases. Colocalizing GWAS variants with variants that may regulate gene expression, via expression quantitative trait loci (eQTL) mapping, has successfully led to the identification of disease-critical genes and their cell types of action. Recent studies predominantly colocalize proximal cis-eQTLs, which are estimated to regulate ~10% of variance in gene e...
Variation in ribosomal DNA (rDNA) copy number influences diverse physiological traits in model organisms, yet its consequences for human health remain poorly characterized. Here, we provide the largest analysis of 45S rDNA copy number to date and the first population-scale characterization of 5S using whole-genome sequencing from 490,383 UK Biobank participants. Despite encoding components of the same molecular machine, these arrays vary independently and associate with divergent phenotypes. Hi...
Many non-coding variants influence complex traits and diseases through gene regulation, yet the mechanisms linking these variants to downstream biology remain poorly understood. Here, we present eQTLGen Phase 2, a comprehensive genome-wide analysis of gene expression quantitative trait loci (eQTLs) in 43,301 blood samples from 52 datasets. Beyond local cis-effects, this sample size enabled the first systematic mapping of trans-eQTLs at scale. We identify cis-eQTLs for nearly all expressed genes ...
Methods that analyze single-cell RNA-seq+ATAC-seq multiome data have shown promise in linking enhancers to target genes by correlating chromatin accessibility with gene expression across cells. However, correlations among ATAC-seq peaks may induce non-causal tagging peak-gene links (analogous to tagging associations in GWAS); indeed, we confirm that tagging effects induced by peak co-accessibility are pervasive in peak-gene linking. We defined two scores for each ATAC-seq peak: co-accessibility ...
Detecting low variant allele fraction (VAF) mosaic variants without matching controls remains a major challenge in genomics, limited by technical noise, lack of benchmarks, and computational scalability. We present the DRAGEN mosaic caller, a hardware-accelerated approach identifying variants down to ~1-2% VAF with low false-positive rates and hour-scale runtimes for mosaic SNV/indel detection from bulk sequencing. To support evaluation, we introduce a genome-wide low-VAF benchmark for variant...
Allele-specific expression (ASE) outlier detection is a powerful tool for identifying genes affected by large-effect rare genetic regulatory variants but suffers from data sparsity and noisy signal in low-count genes. Genome phasing can be utilized to aggregate ASE signal along haplotypes to alleviate both sparsity and noise. Yet statistical tools for utilizing haplotype-level ASE data for rare variant interpretation are lacking. Here, we present ANEVA-h, to quantify the amount of genetic variat...
Background: Attention-deficit/hyperactivity disorder (ADHD) is a common heritable neurodevelopmental disorder, affecting ~7 million children (11.4%) in the U.S. However, ADHD's underlying genetic architecture remains largely unknown. Transcriptome-wide association studies (TWAS), which integrate expression quantitative trait loci (eQTL) and GWAS summary data, can identify differentially expressed risk genes underlying complex phenotypes. Here we conduct a TWAS of ADHD using expression data from m...
Digenic alterations can produce phenotypes such as synthetic lethality or digenic disease that are not observed upon individual gene perturbation, often by disrupting compensatory or redundant biological mechanisms. We hypothesized that gene pairs underlying such phenotypes share, when considered jointly, biological network properties analogous to those of essential genes or monogenic Mendelian disease genes. To test this hypothesis, we developed PAGAN, a graph representation learning framework ...
Most current GWAS-eQTL approaches prioritize genes whose mediating effects on complex traits act through cis-regulation, while trans-acting genes remain largely underexplored. Recent perturbational screening technology provides a novel approach to quantifying trans-effects between gene pairs, but its integration with GWAS data remains largely unexamined. We introduce Mr. PEG, a novel framework that integrates perturbational screens, eQTL, and GWAS summary data to identify mediating genes of comp...
Understanding genetic architectures of disease is fundamental to partitioning heritability, polygenic risk prediction, and statistical fine-mapping. Genetic architectures of disease in European populations have been shown to depend on European minor allele frequency (MAF): SNPs with lower MAF have larger per-allele effects, due to the action of negative selection. However, we hypothesized that African MAF (defined using African-ancestry segments in African Americans), which is not distorted by t...
Background: Genetic studies have disproportionately focused on populations of European ancestry, limiting the generalizability of allele-frequency references and genetic associations to underrepresented groups, including South American populations. This gap is particularly relevant for rare diseases and cancer, where accurate variant interpretation depends in part on appropriate population context. In addition, population-specific haplotype structure influences genome-wide association analyses and ...
Background: Attention-deficit/hyperactivity disorder (ADHD) and migraine are prevalent neurodevelopmental and neurological conditions, respectively, that contribute to individual disability and social burden. The biological mechanisms linking these disorders remain poorly understood. Methods: We aimed to investigate their shared genetic architecture by integrating genomic data with a cross-trait analysis using the largest genome-wide association studies (GWAS) for ADHD and migraine to date. Variant...
Functional interpretation is essential for understanding how genetic variants contribute to complex traits. Here, we identified and characterized regulatory variants in CD4+ T cells collected from 362 donors. We integrated molecular QTL mapping from single-cell RNA-seq profiles and chromatin accessibility with predicted variant effects from a deep learning model trained on chromatin accessibility data. We identified molecular features and transcription factor binding mechanisms underlying varian...
Sinonasal squamous cell carcinoma (SNSCC) is an aggressive head and neck cancer of the sinonasal cavity which has not benefitted from therapeutic advances over decades [1]. Though historically attributed to inhaled carcinogens such as hardwood dust and tobacco smoking [2], SNSCC is incidentally associated with human papillomavirus (HPV) [3,4]. Importantly, HPV is the primary oncogenic driver of >80% of anatomically adjacent oropharyngeal cancers [5]. While viral status drives clinical staging and treatment ...
Background: The 9p21.3 locus was the first genome-wide significant signal for coronary artery disease (CAD) and replicates across multiple non-African populations, yet is absent in African ancestry cohorts. We hypothesized that ancestry-specific linkage disequilibrium (LD) and haplotype structure, rather than allele frequency or power alone, explain this discrepancy. Methods: We analyzed multi-ancestry data from European, East Asian, South Asian, Middle Eastern, African, and Admixed American groups...
Background: Inflammatory Bowel Disease (IBD) is characterized by chronic intestinal inflammation and is associated with both altered gut microbiome composition and host genetic risk. Both host genetic variants and the gut microbiome can affect host gene expression in the colon; however, it remains unclear whether interactions between the two (genotype x microbiome, GxM) shape intestinal gene regulation in humans and how they contribute to IBD risk. Methods: We analyzed publicly available data for 86...
Recent studies showed that expression QTLs, even from trait-related tissues, explain only a small fraction of complex trait heritability. A natural strategy to close this gap is to incorporate molecular QTLs (molQTLs) beyond gene expression, across diverse tissue/cellular contexts. Yet, integrating such QTL data presents analytical challenges. Molecular traits often share QTLs or have QTLs in high LD, complicating the attribution of GWAS signals to specific molecular traits. Our simulations showed ...
Structural variants (SVs) are a major source of genomic diversity and disease susceptibility; however, populations from the Middle East and North Africa (MENA) region remain critically underrepresented in global reference databases. We provide the first detailed catalogue of structural variation in 61 individuals from diverse MENA countries, using publicly available ultra-long Oxford Nanopore sequencing. A scalable, dual-reference alignment-based method (GRCh38 and T2T-CHM13) was employed to ...